Take-home Ex2: Applied Spatial Interaction Models: A case study of Singapore public bus commuter flows

Author

Widya Tantiya Yutika

Published

December 8, 2023

Modified

December 15, 2023

1 Setting the Scene

Urban mobility challenges, such as understanding the driving forces behind early-morning commutes and evaluating the impact of removing public bus services along specific routes, present complex issues for transport operators and urban managers. Traditional commuter surveys, though effective, are costly and time-consuming. With the digitisation of urban infrastructures, particularly public transportation, massive geospatial data sets are generated through technologies such as GPS and SMART cards. However, the inability to utilise this data efficiently can hinder effective decision-making. This exercise aims to address two key issues: the lack of research on integrating diverse open data sources for policymaking, and the insufficient exploration of geospatial data science for decision support.

2 Objective

The objective is to conduct a case study showcasing the potential of geospatial data science in integrating publicly available data to build spatial interaction models that explain the factors influencing urban mobility patterns on public buses.

3 The Study Area and Data

3.1 Aspatial Data

  • Open Government Data

    • Passenger Volume by Origin Destination Bus Stops from LTA DataMall.

    • School Directory and Information from Data.gov.sg.

  • Instructor-curated Datasets for Educational Purpose

    • HDB: This data set is the geocoded version of HDB Property Information data from Data.gov.sg. The data set is prepared using September 2021 data.

3.2 Geospatial Data

  • Open Government Data

    • Bus Stop Location, Train Station and Train Station Exit Point from LTA DataMall.

    • Master Plan 2019 Subzone Boundary from Data.gov.sg.

  • Instructor-curated Datasets for Educational Purpose

    • Business, entertn, F&B, FinServ, Leisure&Recreation and Retails, consisting of the locations of business establishments, entertainment venues, food and beverage outlets, financial centres, leisure and recreation centres, and retail and services stores/outlets.

4 The Task

The specific tasks of this take-home exercise are as follows:

4.1 Geospatial Data Science

  • Derive an analytical hexagon data of 375m (this distance is the perpendicular distance between the centre of the hexagon and its edges) to represent the traffic analysis zone (TAZ).

  • With reference to the time intervals provided in the table below, construct an O-D matrix of commuter flows for a time interval of your choice by integrating Passenger Volume by Origin Destination Bus Stops and Bus Stop Location from LTA DataMall. The O-D matrix must be aggregated at the analytical hexagon level.

    Peak hour period                Bus tap on time
    Weekday morning peak            6am to 9am
    Weekday afternoon peak          5pm to 8pm
    Weekend/holiday morning peak    11am to 2pm
    Weekend/holiday evening peak    4pm to 7pm
  • Display the O-D flows of the passenger trips by using appropriate geovisualisation methods (not more than 5 maps).

  • Describe the spatial patterns revealed by the geovisualisation (not more than 100 words per visual).

  • Assemble at least three propulsive and three attractiveness variables by using aspatial and geospatial data from publicly available sources.

  • Compute a distance matrix by using the analytical hexagon data derived earlier.

4.2 Spatial Interaction Modelling

  • Calibrate spatial interaction models to determine factors affecting urban commuting flows at the selected time interval.

  • Present the modelling results by using appropriate geovisualisation and graphical visualisation methods. (Not more than 5 visuals)

  • With reference to the Spatial Interaction Model output tables, maps and data visualisation prepared, describe the modelling results. (not more than 100 words per visual).

5 Setting the Analytical Tools

Before getting started, I need to ensure that the sf, spdep, tmap, tidyverse, knitr, sp, reshape2, stplanr and httr packages are installed in my R environment.

  • sf : for importing and handling geospatial data in R,

  • spdep : for computing spatial weights, global and local spatial autocorrelation statistics,

  • tmap : for preparing cartographic-quality choropleth maps,

  • tidyverse : for wrangling attribute data in R; tidyverse includes a collection of packages such as readr, ggplot2, dplyr, tibble, purrr, etc.,

  • knitr : for facilitating dynamic report generation in R Markdown documents,

  • sp : for classes and methods for spatial data; used here to compute the distance matrix,

  • reshape2 : for pivoting data between wide and long formats with melt(),

  • stplanr : for transport planning functions such as od2line(), which creates desire lines, and

  • httr : for making HTTP requests to the OneMap API when geocoding school locations.

The code chunk below ensures that the necessary R packages have been installed; any package that is not yet installed will be installed and then loaded into the R environment.

Show the code
pacman::p_load(sf, spdep, tmap, tidyverse, knitr, sp, reshape2, stplanr, httr)

6 Working with Aspatial Data (Commuter Flow)

6.1 Importing OD Csv File into R Environment

Next, I will import the Passenger Volume by Origin Destination Bus Stops data set, origin_destination_bus_202310.csv, downloaded from LTA DataMall for October 2023 into R by using read_csv() of readr package. The output is a tibble data frame.

Show the code
odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")

The code chunk below uses glimpse() of dplyr package to display the odbus tibble data tables.

Show the code
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

From the glimpse() check above, it is shown that ORIGIN_PT_CODE and DESTINATION_PT_CODE are in character type. Both need to be converted to factor type so that they can be treated as categorical variables when georeferencing against the bus stop location data.

Code
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

Next, I will confirm the data type changes using glimpse().

Show the code
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <fct> 04168, 04168, 80119, 80119, 44069, 20281, 20281, 1…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 20141, 20141, 1…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

6.2 Checking for Duplicate Commuter Flow

Code
duplicate_rec <- odbus %>%   
  group_by_all() %>%   
  filter(n()>1) %>%   
  ungroup()

No duplicate records were found in the odbus data set.
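For readers less familiar with the group_by_all() + filter(n() > 1) idiom, the same check can be expressed in base R; this is a minimal toy sketch (the data frame below is illustrative, not the actual odbus data):

```r
# Toy base-R equivalent of the dplyr duplicate check:
# flag every row that appears more than once, counting all copies.
df <- data.frame(origin = c("01012", "01012", "04168"),
                 destin = c("01112", "01112", "10051"))
dupes <- df[duplicated(df) | duplicated(df, fromLast = TRUE), ]
nrow(dupes)  # 2: both copies of the repeated origin-destination row
```

A result of zero rows, as with odbus, means every record is unique.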

6.3 Extracting the Study Data

For the purpose of this take-home exercise, I will extract commuting flows during the weekend/holiday morning peak, which is between 11am and 2pm.

Code
odbus11_14 <- odbus %>%      
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%      
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE,DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
glimpse(odbus11_14)
Rows: 222,154
Columns: 3
Groups: ORIGIN_PT_CODE [5,012]
$ ORIGIN_PT_CODE      <fct> 01012, 01012, 01012, 01012, 01012, 01012, 01012, 0…
$ DESTINATION_PT_CODE <fct> 01112, 01113, 01121, 01211, 01311, 01549, 01559, 0…
$ TRIPS               <dbl> 265, 189, 120, 141, 218, 1, 7, 16, 8, 8, 58, 21, 7…

7 Working with Geospatial Data

7.1 Importing Shapefile into R Environment

  • Bus Stop Location

The code chunk below uses st_read() of sf package to import the BusStop shapefile into R. The imported shapefile will be a simple features object of sf. Then, I use st_transform() of sf package to project the coordinates to EPSG code 3414 for SVY21 (the projected coordinate system for Singapore).

Show the code
busstop <- st_read(dsn = "data/geospatial", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

7.1.1 Checking Duplicate Bus Stop

Code
duplicate_bus_stop <- busstop %>%   
  group_by(BUS_STOP_N) %>%   
  filter(n()>1)

There are 16 duplicated records with the same BUS_STOP_N but different geometry points. Upon investigation, the coordinates are quite close to each other, which points to the possibility of temporary bus stops. In view of this, I will remove the duplicated records and keep only the first occurrence using the code chunk below.

Code
busstop <- busstop %>%   
  distinct(BUS_STOP_N, .keep_all =TRUE)
  • Master Plan 2019 Subzone Boundary

The code chunk below uses st_read() of sf package to import the MPSZ-2019 shapefile into R. The imported shapefile will be a simple features object of sf. Then, I use st_transform() of sf package to project the coordinates to EPSG code 3414 for SVY21 (the projected coordinate system for Singapore).

Show the code
mpsz <- st_read(dsn = "data/geospatial", layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

7.2 Creating Honeycomb/Hexagon Layer

A honeycomb (hexagon) layer is preferred over the coarse and irregular Master Plan 2019 Subzone GIS data set of URA because hexagons reduce sampling bias, thanks to their low perimeter-to-area ratio and their ability to form an evenly spaced grid. Honeycomb grids are also well-suited to approximating circular areas, making them suitable for covering Singapore's irregular boundary.

The code chunk below uses st_make_grid() of sf package to create a hexagonal, or honeycomb, grid with a 375m apothem (the perpendicular distance between the centre of a hexagon and its edges) to represent the traffic analysis zone (TAZ). According to the R documentation, cellsize is the distance between opposite edges, which is two times the perpendicular distance between the centre of the hexagon and its edges. Thus, for the purpose of this exercise, I will use a cellsize of 750m and set square = FALSE for a hexagonal grid. After doing so, I will create a grid_id for each hexagonal grid (2,541 hexagonal grids).

Show the code
area_honeycomb_grid = st_make_grid(busstop, cellsize=750, what = "polygons", square = FALSE)      
# To sf and add grid ID    
honeycomb_grid_sf = st_sf(area_honeycomb_grid) %>%
  # add grid ID            
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))
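The relationship between the 375m apothem and the 750m cellsize can be verified with a little regular-hexagon geometry; the figures below are my own arithmetic from the standard formulas, not output from sf:

```r
# For a regular hexagon, st_make_grid()'s cellsize is the distance between
# opposite edges, i.e. twice the apothem (centre-to-edge distance).
apothem  <- 375                      # perpendicular centre-to-edge distance (m)
cellsize <- 2 * apothem              # value passed to st_make_grid(): 750 m
side     <- cellsize / sqrt(3)       # side length of each hexagon (m)
area_m2  <- 2 * sqrt(3) * apothem^2  # area of a regular hexagon via its apothem
round(side, 2)           # about 433.01 m
round(area_m2 / 1e6, 3)  # about 0.487 sq km per traffic analysis zone
```

So each TAZ covers roughly half a square kilometre, a reasonable walking-scale zone for bus commuting analysis.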

Next, I will only retain the hexagons with at least 1 bus stop in it.

honeycomb_grid_sf$busstop_count = lengths(st_intersects(honeycomb_grid_sf, busstop))


honeycomb_with_busstop = filter(honeycomb_grid_sf, busstop_count > 0)

There are 834 hexagonal units with at least 1 bus stop.

plot(honeycomb_with_busstop['busstop_count'], 
     main = 'Bus Stop Count in Each Hexagonal Grid')

Some hexagonal grids are densely packed with bus stops, with a maximum of 19 bus stops, shown in yellow above.

7.3 Data Wrangling with Geospatial Data

In this section, I use st_intersection() of sf package to overlay the bus stop points and the hexagonal grids.

busstop_honeycomb <- st_intersection(busstop, honeycomb_with_busstop) %>%
  select(BUS_STOP_N, LOC_DESC, grid_id) %>%
  st_drop_geometry()

8 Build an OD Matrix of Commuter Flows for Weekends/Holiday Morning Peak Hour

In this section, I will construct an origin-destination matrix with the trip counts for each combination.

First, I will append “ORIGIN_GRID_ID” and “ORIGIN_LOC_DESC” from busstop_honeycomb data frame onto odbus11_14 data frame using the code chunk below.

od_data <- left_join(odbus11_14 , busstop_honeycomb,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID_ID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE,
         ORIGIN_LOC_DESC = LOC_DESC)

Next, I will check for duplicated records on od_data using the chunk code below.

Code
duplicate_od_data <- od_data %>%   
  group_by_all() %>%   
  filter(n()>1) %>%   
  ungroup()

The above code chunk shows that there are no duplicate records.

Next, I will update od_data data frame by performing another left join with busstop_honeycomb to get the “DESTIN_GRID_ID” and “DESTIN_LOC_DESC”.

od_data <- left_join(od_data , busstop_honeycomb,
            by = c("DESTIN_BS" = "BUS_STOP_N")) %>%
  rename(DESTIN_GRID_ID = grid_id,
         DESTIN_LOC_DESC = LOC_DESC)
missing_values <- colSums(is.na(od_data))

# Print the columns with missing values
print(missing_values)
      ORIGIN_BS       DESTIN_BS           TRIPS ORIGIN_LOC_DESC  ORIGIN_GRID_ID 
              0               0               0            2720            2199 
DESTIN_LOC_DESC  DESTIN_GRID_ID 
           2849            2257 

The missing grid_id values for some origin and destination bus stops may be due to the Bus Stop Location data being outdated: it was last updated in July 2023, whereas the Passenger Volume by Origin Destination Bus Stops data was collected in October 2023.
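To see exactly which bus stop codes fail to match, one could compare the two code sets directly with setdiff(); the sketch below uses toy vectors with hypothetical codes rather than the actual odbus and busstop objects:

```r
# Hypothetical illustration: codes present in the OD data but absent from
# the BusStop layer are the ones that end up with NA grid_id after left_join().
od_codes      <- c("01012", "04168", "99999")  # codes appearing in the OD flows
busstop_codes <- c("01012", "04168", "80119")  # codes in the bus stop layer
setdiff(od_codes, busstop_codes)               # "99999" has no matching stop
```

Running the same comparison on the real data would list the bus stops added between July and October 2023.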

So, I will remove the rows with NA values, then calculate the number of trips during the weekend/holiday morning peak by grouping on "ORIGIN_GRID_ID" and "DESTIN_GRID_ID" and summing the trips for each combination of hexagonal grid ids using the code chunk below.

od_data <- od_data %>%
  drop_na()
od_data1 <- od_data %>%
  group_by(ORIGIN_GRID_ID, DESTIN_GRID_ID) %>%
  summarise(MORNING_PEAK = sum(TRIPS),
            ORIGIN_DESC = paste(unique(ORIGIN_LOC_DESC), collapse = ', '),
            DESTIN_DESC = paste(unique(DESTIN_LOC_DESC), collapse = ', '))

I will check the distribution of the Morning Peak trips using summary as shown below.

summary(od_data1$MORNING_PEAK)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       4      14     120      58   43418 

I will save the output into an rds file format.

write_rds(od_data, "data/rds/od_data.rds")
od_data <- read_rds("data/rds/od_data.rds")

9 Computing Distance Matrix by Analytics Hexagon Level

First, honeycomb_with_busstop will be coerced from an sf tibble data frame to a SpatialPolygonsDataFrame of sp object using as(x, "Spatial"), as shown in the code chunk below. This method is used because computing a distance matrix is faster with sp than with sf.

honeycomb_with_busstop_sp <- as(honeycomb_with_busstop, "Spatial") 
honeycomb_with_busstop_sp
class       : SpatialPolygonsDataFrame 
features    : 834 
extent      : 3595.122, 48595.12, 26049.09, 53545.39  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 2
names       : grid_id, busstop_count 
min values  :      23,             1 
max values  :    2505,            19 

Next, spDists() of sp package will be used to compute the Euclidean distances between the centroids of the hexagons. When longlat is set to FALSE, spDists() returns a full matrix of distances in the metric of the points (metres here, since the data are projected); when longlat is set to TRUE, it returns distances in kilometres. In my case, since there are 834 hexagons, a 834 x 834 matrix will be created, and I will print the first 8 rows and columns using head().

dist <- spDists(honeycomb_with_busstop_sp, 
                longlat = FALSE)
head(dist, n=c(8, 8))
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
[1,]    0.000  750.000 3269.174 1500.000 2704.163 3968.627 1299.038 2250.000
[2,]  750.000    0.000 2598.076  750.000 1984.313 3269.174  750.000 1500.000
[3,] 3269.174 2598.076    0.000 1984.313  750.000  750.000 2704.163 1500.000
[4,] 1500.000  750.000 1984.313    0.000 1299.038 2598.076  750.000  750.000
[5,] 2704.163 1984.313  750.000 1299.038    0.000 1299.038 1984.313  750.000
[6,] 3968.627 3269.174  750.000 2598.076 1299.038    0.000 3269.174 1984.313
[7,] 1299.038  750.000 2704.163  750.000 1984.313 3269.174    0.000 1299.038
[8,] 2250.000 1500.000 1500.000  750.000  750.000 1984.313 1299.038    0.000

From the matrix above, it is noticed that the column and row headers are not labelled by grid_id. So, I will label the headers by first creating a vector of grid_id values in the same order as the distance matrix, and then attaching it to the rows and columns of the matrix.

grid_id_desc <- honeycomb_with_busstop$grid_id

colnames(dist) <- paste0(grid_id_desc)
rownames(dist) <- paste0(grid_id_desc)
head(dist, n=c(8,8))
         23       44       46       66       67       68       86       87
23    0.000  750.000 3269.174 1500.000 2704.163 3968.627 1299.038 2250.000
44  750.000    0.000 2598.076  750.000 1984.313 3269.174  750.000 1500.000
46 3269.174 2598.076    0.000 1984.313  750.000  750.000 2704.163 1500.000
66 1500.000  750.000 1984.313    0.000 1299.038 2598.076  750.000  750.000
67 2704.163 1984.313  750.000 1299.038    0.000 1299.038 1984.313  750.000
68 3968.627 3269.174  750.000 2598.076 1299.038    0.000 3269.174 1984.313
86 1299.038  750.000 2704.163  750.000 1984.313 3269.174    0.000 1299.038
87 2250.000 1500.000 1500.000  750.000  750.000 1984.313 1299.038    0.000

Next, I will pivot the distance matrix into a long table keyed by the row and column grid_id, using melt() of reshape2 package to convert wide-format data to long-format data.

For reference : https://www.statology.org/long-vs-wide-data/

distPair <- melt(dist) %>%
  rename(dist = value)
head(distPair, 10)
   Var1 Var2     dist
1    23   23    0.000
2    44   23  750.000
3    46   23 3269.174
4    66   23 1500.000
5    67   23 2704.163
6    68   23 3968.627
7    86   23 1299.038
8    87   23 2250.000
9    88   23 3436.932
10   89   23 4683.748

There are 695,556 rows in distPair, corresponding to the number of ordered pairs (permutations) of the 834 hexagons, including self-pairs.
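The row count is a quick arithmetic sanity check: melting a full n x n distance matrix always yields n^2 ordered pairs.

```r
n <- 834  # number of hexagons with at least one bus stop
n^2       # 695556, matching the number of rows in distPair
```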

summary(distPair$dist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0    8250   13332   14145   18929   44680 

From the summary of "dist" above, it is noticed that there are distances of 0; these are intra-zonal distances, meaning the trips start and end in the same grid id. Since my minimum inter-zonal distance is 750m, the intra-zonal distance should be set to a value below 750/2 = 375m. So, I choose a value of 350m for the intra-zonal distance.

distPair %>%
  filter(dist > 0) %>%
  summary()
      Var1           Var2           dist      
 Min.   :  23   Min.   :  23   Min.   :  750  
 1st Qu.: 871   1st Qu.: 871   1st Qu.: 8352  
 Median :1324   Median :1324   Median :13332  
 Mean   :1269   Mean   :1269   Mean   :14162  
 3rd Qu.:1688   3rd Qu.:1688   3rd Qu.:18929  
 Max.   :2505   Max.   :2505   Max.   :44680  
distPair$dist <- ifelse(distPair$dist == 0, 350, distPair$dist)
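The 350m value applied above follows a simple bound, restated here as arithmetic (my own reasoning, not a rule from any package): the intra-zonal distance must sit below half the minimum inter-zonal centroid spacing so that within-zone trips always rank as shorter than trips to the nearest neighbouring hexagon.

```r
min_inter   <- 750            # smallest centroid-to-centroid spacing (m)
upper_bound <- min_inter / 2  # intra-zonal distance must stay below 375 m
intra_dist  <- 350            # chosen value, comfortably under the bound
stopifnot(intra_dist < upper_bound)
```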

Next, I will rename Var1 and Var2 into the origin and destination grid_id.

distPair <- distPair %>%
  rename(ORIGIN_GRID_ID = Var1,
         DESTIN_GRID_ID = Var2)
summary(distPair$dist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    350    8250   13332   14145   18929   44680 

10 Preparing Flow Data

In this section, I will rename my od_data1 to flow_data.

flow_data <- od_data1

print(head(flow_data))
# A tibble: 6 × 5
# Groups:   ORIGIN_GRID_ID [2]
  ORIGIN_GRID_ID DESTIN_GRID_ID MORNING_PEAK ORIGIN_DESC            DESTIN_DESC 
           <int>          <int>        <dbl> <chr>                  <chr>       
1             44            128            4 BEF TUAS STH AVE 14    AFT TUAS ST…
2             44            129            2 BEF TUAS STH AVE 14    BEF TUAS ST…
3             44            175           20 BEF TUAS STH AVE 14    TUAS WEST R…
4             46             67            2 SEE HUP SENG           BEF TUAS ST…
5             46            111            6 YONG NAM               AFT TUAS ST…
6             46            134           73 SEE HUP SENG, YONG NAM TUAS LK STN 

10.1 Separating Intra-Flow from flow_data Data Frame

The code chunk below is used to add two new fields in flow_data dataframe.

  • ‘FlowNoIntra’ : 0 for intra-zonal flow and MORNING_PEAK for inter-zonal flow

  • ‘offset’ : a very small number (i.e. 0.000001) for intra-zonal flow and 1 for inter-zonal flow

flow_data$FlowNoIntra <- ifelse(
  flow_data$ORIGIN_GRID_ID == flow_data$DESTIN_GRID_ID, 
  0, flow_data$MORNING_PEAK)
flow_data$offset <- ifelse(
  flow_data$ORIGIN_GRID_ID == flow_data$DESTIN_GRID_ID, 
  0.000001, 1)

I will create 2 separate data frames for intra and inter zonal flow.

intra_zonal_flow <- flow_data %>% 
  filter(FlowNoIntra ==0)

inter_zonal_flow <- flow_data %>% 
  filter(FlowNoIntra >0)

Out of a total of 62,176 flow records, there are 568 intra-zonal flows and 61,608 inter-zonal flows.

intra_zonal_flow
# A tibble: 568 × 7
# Groups:   ORIGIN_GRID_ID [568]
   ORIGIN_GRID_ID DESTIN_GRID_ID MORNING_PEAK ORIGIN_DESC            DESTIN_DESC
            <int>          <int>        <dbl> <chr>                  <chr>      
 1            111            111            1 "TUAS TECHNOLOGY PK"   AFT TUAS S…
 2            133            133            4 "BEF TUAS WEST AVE, A… AFT PIONEE…
 3            150            150            1 "OPP SLS"              BEF TUAS S…
 4            154            154            9 "AFT TUAS WEST RD STN" ROTARY ENG…
 5            175            175            8 "OPP NISSIN TPT, OPP … TUAS WEST …
 6            176            176           13 "TUAS TER, BEF TUAS W… OPP TUAS L…
 7            216            216           19 "TUAS CRES STN EXIT B… TUAS CRES …
 8            217            217           13 "AFT TUAS AVE 7, SMC … OPP SMC MF…
 9            237            237            1 "BEF TUAS AMENITY CTR" TRU-MARINE 
10            278            278            1 "KEPPEL SHIPYARD"      DUNDEE MAR…
# ℹ 558 more rows
# ℹ 2 more variables: FlowNoIntra <dbl>, offset <dbl>

10.2 Combining Passenger Volume Data (Inter-Zonal Flow Data) with Distance Value

Before joining "inter_zonal_flow" and "distPair", I will convert the ORIGIN_GRID_ID and DESTIN_GRID_ID fields of both data frames into factor data type.

inter_zonal_flow$ORIGIN_GRID_ID <- as.factor(inter_zonal_flow$ORIGIN_GRID_ID) 
inter_zonal_flow$DESTIN_GRID_ID  <- as.factor(inter_zonal_flow$DESTIN_GRID_ID)
distPair$ORIGIN_GRID_ID  <- as.factor(distPair$ORIGIN_GRID_ID)
distPair$DESTIN_GRID_ID  <- as.factor(distPair$DESTIN_GRID_ID )

Now, left_join() of dplyr will be used to combine “inter_zonal_flow” dataframe and distPair dataframe. The output is called flow_data1.

flow_data1 <- inter_zonal_flow %>%
  left_join (distPair,
             by = c("ORIGIN_GRID_ID" = "ORIGIN_GRID_ID",
                    "DESTIN_GRID_ID" = "DESTIN_GRID_ID"))
print(head(flow_data1))
# A tibble: 6 × 8
# Groups:   ORIGIN_GRID_ID [2]
  ORIGIN_GRID_ID DESTIN_GRID_ID MORNING_PEAK ORIGIN_DESC DESTIN_DESC FlowNoIntra
  <fct>          <fct>                 <dbl> <chr>       <chr>             <dbl>
1 44             128                       4 BEF TUAS S… AFT TUAS S…           4
2 44             129                       2 BEF TUAS S… BEF TUAS S…           2
3 44             175                      20 BEF TUAS S… TUAS WEST …          20
4 46             67                        2 SEE HUP SE… BEF TUAS S…           2
5 46             111                       6 YONG NAM    AFT TUAS S…           6
6 46             134                      73 SEE HUP SE… TUAS LK STN          73
# ℹ 2 more variables: offset <dbl>, dist <dbl>

11 Visualising O-D Flows of Weekends/Holiday Morning Peak Hour

In this section, I will create desire lines for the inter-zonal flows (flow_data1) by using od2line() of stplanr package.

flowLine <- od2line(flow = flow_data1, zones = honeycomb_with_busstop, zone_code = "grid_id") 

To visualise the resulting desire lines, the code chunk below is used. In the map below, I have filtered out flows with fewer than 5,000 trips for ease of analysis. A thicker line indicates a flow with more trips, while the length of a desire line reflects the distance of the inter-zonal flow.

tmap_options(check.and.fix = TRUE) 
tmap_mode("plot") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.5)+
  tm_text(text = "SUBZONE_N", size = 0.2) +
tm_shape(honeycomb_with_busstop) +
  tm_polygons(col = "lightblue", alpha=0.5) +    
  flowLine %>%   filter(MORNING_PEAK >= 5000) %>%   
  tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",                        
           style = "quantile", 
           scale = c(0.5, 2, 4, 6, 8, 10, 12),                        
           n = 6,                        
           alpha = 0.3,
           col="red")+   
  tm_layout(main.title = 'Static Map: OD Flow On Weekends/Holiday Morning Peak Hour' ,
            main.title.position = "center",
            main.title.size = 1.0,
            legend.width=1)

From the map above, I observe a noticeable concentration of public bus flow between the Woodlands area and Johor Bahru (the hexagon outside the Singapore boundary), indicated by a thick band extending between the two during the weekend/holiday morning peak. We will further analyse whether the flow originates from Singapore or from Johor Bahru.

flowLine %>% filter(MORNING_PEAK >= 30000)
Simple feature collection with 2 features and 8 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 20470.12 ymin: 47266.71 xmax: 20845.12 ymax: 49215.27
Projected CRS: SVY21 / Singapore TM
  ORIGIN_GRID_ID DESTIN_GRID_ID MORNING_PEAK         ORIGIN_DESC
1            962            984        43418     W'LANDS CHECKPT
2            984            983        31087 JOHOR BAHRU CHECKPT
          DESTIN_DESC FlowNoIntra offset     dist
1 JOHOR BAHRU CHECKPT       43418      1 1984.313
2     W'LANDS CHECKPT       31087      1 1299.038
                        geometry
1 LINESTRING (20470.12 47266....
2 LINESTRING (20845.12 49215....

The public bus flows suggest that the predominant movement is from Woodlands Checkpoint to Johor Bahru Checkpoint, with around 43k trips. The flow from Johor Bahru Checkpoint back to Singapore is also comparably high, at around 31k trips. This commuter flow could be attributed to individuals engaging in leisure activities, entertainment, tourism, and family or friend visits in Johor/Singapore; some even take advantage of cost-effective grocery shopping in Johor.

summary((flowLine %>% filter(MORNING_PEAK >= 5000))$dist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    750     750     750    1056    1299    2704 

Furthermore, I observe that during weekend mornings, the predominant public bus flows span relatively short distances, with a maximum of about 2.7 kilometres. This may be because weekends are typically rest days, leading to shorter commutes or more localised activities within a confined radius. Alternatively, individuals may opt for other modes of transport, such as the MRT, for longer journeys. In view of activities within a confined radius, I will also analyse intra-zonal flows.

intra_flow_map <- honeycomb_with_busstop %>%
  inner_join(intra_zonal_flow, by = c("grid_id" = "ORIGIN_GRID_ID")) %>%
  rename("LOC_DESC"="ORIGIN_DESC") %>%
  select("grid_id","LOC_DESC","MORNING_PEAK","busstop_count","area_honeycomb_grid")

intra_flow_map <- intra_flow_map %>%
  mutate(short_loc_desc = str_extract(LOC_DESC, "^[^,]+"))
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.5)+ 
tm_shape(intra_flow_map) +
  tm_borders() +
  tm_fill("MORNING_PEAK", palette = "Blues",
          title = "Morning Peak",
          breaks= c(1,1000,5000,8000,11000,14500),
          legend.show = TRUE,
          popup.vars = c("Location Description" ="short_loc_desc",
                         "Trips Count"="MORNING_PEAK",
                         "Bus Stop Count"="busstop_count"
                         )) +
  tm_view(set.zoom.limits =c(11,17))+
  tm_layout(main.title = 'Intra Flow On Weekends/Holiday Morning Peak Hour' ,
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the map above, the heaviest intra-zonal public bus flows occur in the vicinity of Admiralty Int and BLK 126 CP (in the Bukit Merah and Tiong Bahru area). This suggests that a significant volume of trips takes place within short distances in these areas.

The code chunk below produces an interactive map for further insights.

tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.5)+
tm_shape(honeycomb_with_busstop) +
  tm_polygons(col = "lightblue", alpha=0.5) +    
  flowLine %>%   filter(MORNING_PEAK >= 5000) %>%   
  tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",                        
           style = "quantile", 
           scale = c(2, 4, 6, 8, 10, 12, 14),                        
           n = 6,                        
           alpha = 0.3,
           col="red",
           popup.vars=c("Origin Description"="ORIGIN_DESC",
                        "Destination Description" ="DESTIN_DESC",
                        "Trips Count" ="MORNING_PEAK"))+   
  tm_view(set.zoom.limits =c(11,17))+
  tm_layout(main.title = 'Interactive Map: OD Flow On Weekends/Holiday Morning Peak Hour',
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold')

From the map above, I also notice substantial volumes flowing to hexagonal grids containing bus interchanges or MRT/LRT stations, such as Boon Lay, Choa Chu Kang, Bukit Panjang, Yew Tee, Clementi, Bishan, Ang Mo Kio, Serangoon, Toa Payoh, Sengkang, Bedok, Tampines and Pasir Ris, which can be one of the factors considered in my Spatial Interaction Model later on.

12 Assemble Propulsive (Origin) and Attractiveness (Destination) Variables for Spatial Interaction Model

12.1 Aspatial Data 1 : School Location

First, I will import the School General Information file in csv format, downloaded from data.gov.sg under School Directory and Information. This data set consists of school names and postal codes for MOE Kindergartens, Primary Schools, Secondary Schools, Junior Colleges, Mixed Levels and the Centralised Institute for pre-university. In total, there are 346 records. Since the data set does not include polytechnics and ITE, I will collate that information from moe.gov.sg. Currently, there are 8 polytechnics/ITE. Next, I will merge the 2 data sets, giving a total of 354 schools.

school <- read_csv("data/aspatial/Generalinformationofschools.csv")
poly_ite <- read_csv("data/aspatial/poly_ite.csv")

poly_ite <- poly_ite %>% 
  rename(school_name = poly_ite_name) %>% 
  mutate(postal_code = as.character(postal_code))

merged_school <- bind_rows(school, poly_ite)

Next, I will use the OneMap API to geocode the school locations by retrieving the longitude and latitude coordinates using the postal_code field.

#|eval: false
url <- "https://www.onemap.gov.sg/api/common/elastic/search"

postcodes <- merged_school$'postal_code'
found <- data.frame()
not_found <- data.frame()

for (postcode in postcodes){
  query <- list('searchVal'=postcode,'returnGeom'='Y','getAddrDetails'='Y','pageNum'='1')
  res <- GET(url, query=query)
  
  if ((content(res)$found) != 0){
    found <- rbind(found, data.frame(content(res))[4:13])
  } else {
    # append rather than overwrite, so every failed postcode is kept
    not_found <- rbind(not_found, data.frame(postcode))
  }
}

merged = merge(merged_school, found, by.x='postal_code', by.y='results.POSTAL', all=TRUE)

It is noticed that the number of rows has increased, which suggests duplicated rows; upon checking, there are 8 duplicated rows, so I will remove them.

duplicate_school <- merged %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
merged <- merged %>%
  distinct(across(everything()), .keep_all = TRUE)

Then, I will check whether there is any postal code that cannot be geocoded using the code chunk below.

not_found
  postcode
1   677741

There is 1 postal code that cannot be geocoded, which is 677741. In this case, I will manually extract the longitude (103.7651) and latitude (1.389279) for this specific postal code and fill the NA values with the code chunk below.

merged <- merged %>%
  mutate(results.LATITUDE = ifelse(postal_code == "677741", 1.389279, results.LATITUDE)) %>%
  mutate(results.LONGITUDE = ifelse(postal_code == "677741", 103.7651, results.LONGITUDE))

I will then write the merged file into a csv file.

write.csv(merged, file='data/aspatial/schools.csv')
write.csv(not_found, file='data/aspatial/not_found.csv')

Next, I will import the schools final data set and tidy it up (i.e. renaming the column headers) and only extract necessary fields using the code chunk below.

schools <- read_csv("data/aspatial/schools.csv") %>%
  rename(latitude='results.LATITUDE', longitude='results.LONGITUDE')%>%
  select(postal_code, school_name, latitude,longitude)

Then, I will convert the schools data into an sf object and transform the coordinates to the Singapore Projected Coordinate System SVY21 (EPSG: 3414).

schools_sf <- st_as_sf(schools, 
                       coords=c('longitude', 'latitude'),
                       crs=4326) %>%
  st_transform(crs=3414)

After that, I will count the number of schools in each hexagon using st_intersects() of the sf package.

honeycomb_with_busstop$'SCHOOL_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, schools_sf))
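The counting idiom used here relies on st_intersects() returning, for each polygon, the indices of the points that fall inside it; lengths() then turns those index lists into per-polygon counts. A self-contained toy sketch (two unit squares standing in for hexagons, not the actual data):

```r
library(sf)

# Two unit squares side by side (toy stand-ins for hexagonal grids)
sq1 <- st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))))
sq2 <- st_polygon(list(rbind(c(1,0), c(2,0), c(2,1), c(1,1), c(1,0))))
grid <- st_sf(grid_id = 1:2, geometry = st_sfc(sq1, sq2))

# Three points: two inside the first square, one inside the second
pts <- st_sf(geometry = st_sfc(st_point(c(0.2, 0.5)),
                               st_point(c(0.8, 0.5)),
                               st_point(c(1.5, 0.5))))

# st_intersects() returns index lists; lengths() gives counts per polygon
grid$POINT_COUNT <- lengths(st_intersects(grid, pts))
grid$POINT_COUNT  # 2 1
```

The same lengths(st_intersects(...)) pattern is reused below for businesses, entertainment, F&B, financial services and the other point layers.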

Next, I will check the school_count distributions using summary().

summary(honeycomb_with_busstop$SCHOOL_COUNT)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.4185  1.0000  4.0000 
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.3)+
tm_shape(honeycomb_with_busstop%>% filter(SCHOOL_COUNT>0)) +
  tm_borders() +
  tm_fill("SCHOOL_COUNT", palette = "Blues",
          title = "School Count",
          breaks= c(1,2,3,4,5),
          legend.show = TRUE,
          alpha=0.7) +
  tm_view(set.zoom.limits =c(11,14))+
  tm_layout(main.title = 'SCHOOL COUNT on Hexagonal Grid' ,
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the above map, it is noticeable that schools are spread across Singapore; some areas like Punggol and Sengkang have more schools than others, while the CBD and Tuas areas have fewer schools.

12.2 Aspatial Data 2 : HDB Location

I will import the hdb in csv format provided and collated by Prof. Kam. This data set is the geocoded version of HDB Property Information data from Data.gov.sg. The data set is prepared using September 2021 data, consisting of 12,442 records with 37 columns.

hdb <- read_csv("data/aspatial/hdb.csv")

Next, I will convert the longitude and latitude from WGS84 to the Singapore Projected Coordinate System SVY21 using st_as_sf() followed by st_transform().

hdb_sf <- st_as_sf(hdb,                         
                       coords=c('lng', 'lat'),                        
                       crs=4326) %>%   
  st_transform(crs=3414)

Next, I will check for duplicated hdb.

duplicate_hdb <- hdb_sf %>%   
  group_by(blk_no,street) %>%   
  filter(n()>1) %>%   
  ungroup()

There are no duplicated HDB records, so next, I will count the dwelling units in each hexagonal grid.

dwelling_units_count <- st_intersection(honeycomb_with_busstop, hdb_sf) %>%
  group_by(grid_id) %>%
  summarise(HDB_DWELLING_UNIT=sum(total_dwelling_units)) %>%
  st_drop_geometry()
honeycomb_with_busstop = left_join(honeycomb_with_busstop, dwelling_units_count)

Next, I will check the distribution of dwelling units in Singapore.

summary(honeycomb_with_busstop$HDB_DWELLING_UNIT)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0     955    2364    2727    4077    7946     436 

Next, I will replace NA with 0, meaning the hexagonal grid is not a residential area.

honeycomb_with_busstop <- honeycomb_with_busstop %>%
  mutate(HDB_DWELLING_UNIT = replace_na(HDB_DWELLING_UNIT, 0))
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.3)+
tm_shape(honeycomb_with_busstop%>% filter(HDB_DWELLING_UNIT>0)) +
  tm_borders() +
  tm_fill("HDB_DWELLING_UNIT", palette = "Blues",
          title = "HDB Dwelling Units Count",
          breaks= c(1,1000,2000,4000,6000,7946),
          legend.show = TRUE,
          alpha=0.7) +
  tm_view(set.zoom.limits =c(11,14))+
  tm_layout(main.title = 'HDB Dwelling Units on Hexagonal Grid' ,
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the map above, HDB blocks are spread across Singapore with high concentrations in Punggol, Sengkang, Tampines, Bedok, Woodlands, Yishun, Choa Chu Kang and Jurong West.

12.3 Geospatial Data 1 : Business

This business data is curated by Prof. Kam and consists of 6,550 point coordinates of business establishments in Singapore. The shapefile is already in SVY21, so there is no need for st_transform().

business_sf <- st_read(dsn="data/geospatial", layer="Business")
Reading layer `Business' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM

Next, I will check for duplicated business.

duplicate_business <- business_sf %>%   
  group_by_all() %>%   
  filter(n()>1) %>%   
  ungroup()

Apparently there are 2 duplicate business records, so I will remove the duplicated records and keep the first occurrence.

business_sf<- unique(business_sf)

Next, I will count the number of businesses in each hexagonal grid.

honeycomb_with_busstop$'BUSINESS_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, business_sf))

Next, I will check the distribution of BUSINESS_COUNT.

summary(honeycomb_with_busstop$'BUSINESS_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   1.000   7.272   7.000  97.000 
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.3)+
tm_shape(honeycomb_with_busstop%>% filter(BUSINESS_COUNT>0)) +
  tm_borders() +
  tm_fill("BUSINESS_COUNT", palette = "Blues",
          title = "Business Count",
          breaks= c(1,10,20,30,50,97),
          legend.show = TRUE,
          alpha=0.7) +
  tm_view(set.zoom.limits =c(11,14))+
  tm_layout(main.title = 'BUSINESS COUNT on Hexagonal Grid' ,
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the map above, it is observed that certain regions, especially industrial areas like Jurong West to Tuas, Woodlands Industrial Area, Kaki Bukit Industrial Estate and Loyang Industrial Estate, exhibit a higher concentration of businesses. The CBD/Downtown area also has a high number of businesses, as it is the main commercial area in Singapore.

12.4 Geospatial Data 2 : Entertainment

This entertainment data is curated by Prof. Kam, consisting of 114 point coordinates of entertainment venues like cinemas, theatres and art centres in Singapore. The shapefile is already in SVY21, so there is no need for st_transform().

entertainment_sf <- st_read(dsn="data/geospatial", layer="entertn")
Reading layer `entertn' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM

Next, I will check for duplicated entertainment.

duplicate_entertn <- entertainment_sf %>%   
  group_by_all() %>%   
  filter(n()>1) %>%   
  ungroup()

Apparently there are 2 duplicate entertainment records, so I will remove the duplicated records and keep the first occurrence.

entertainment_sf <- unique(entertainment_sf)

Next, I will count the number of entertainments in each hexagonal grid.

honeycomb_with_busstop$'ENTERTAINMENT_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, entertainment_sf))

Next, I will check the distribution of ENTERTAINMENT_COUNT.

summary(honeycomb_with_busstop$'ENTERTAINMENT_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.1331  0.0000  9.0000 
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+   
  tm_polygons(alpha=0.3)+
tm_shape(honeycomb_with_busstop%>% filter(ENTERTAINMENT_COUNT>0)) +
  tm_borders() +
  tm_fill("ENTERTAINMENT_COUNT", palette = "Blues",
          title = "Entertainment Count",
          breaks= c(1,3,5,7,9),
          legend.show = TRUE,
          alpha=0.7) +
  tm_view(set.zoom.limits =c(11,14))+
  tm_layout(main.title = 'ENTERTAINMENT COUNT on Hexagonal Grid' ,
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the map above, it is observed that the CBD area, especially City Hall to Bugis, exhibits a higher concentration of entertainment venues in Singapore.

12.5 Geospatial Data 3 : F&B

This F&B data is curated by Prof. Kam, consisting of 1,919 point coordinates of F&B outlets like cafes, restaurants, bars, clubs, karaoke lounges and pubs in Singapore. The shapefile is already in SVY21, so there is no need for st_transform().

fnb_sf <- st_read(dsn="data/geospatial", layer="F&B")
Reading layer `F&B' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM

Next, I will check for duplicated F&B.

duplicate_fnb <- fnb_sf %>%      
  group_by_all() %>%      
  filter(n()>1) %>%      
  ungroup()

There are no duplicated F&B records, so next, I will count the number of F&B outlets in each hexagonal grid.

honeycomb_with_busstop$'FNB_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, fnb_sf))

Next, I will check the distribution of FNB_COUNT.

summary(honeycomb_with_busstop$'FNB_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   2.204   1.000 133.000 
tmap_options(check.and.fix = TRUE)  
tmap_mode("view")  
tm_shape(mpsz)+      
  tm_polygons(alpha=0.3)+ 
  tm_shape(honeycomb_with_busstop%>% 
             filter(FNB_COUNT>0)) +   
  tm_borders() +   
  tm_fill("FNB_COUNT", 
          palette = "Blues",           
          title = "FNB Count",           
          breaks= c(1,10,20,30,50,70,100,133),           
          legend.show = TRUE,           
          alpha=0.7) +   
  tm_view(set.zoom.limits =c(11,14))+   
  tm_layout(main.title = 'FNB COUNT on Hexagonal Grid' ,             
            main.title.position = "center",             
            main.title.size = 1.0,             
            main.title.fontface = 'bold',             
            legend.width=1) 
tmap_mode("plot") 

From the map above, similar to entertainment, it is observed that the CBD area, especially Tanjong Pagar to Bugis, exhibits a higher concentration of F&B outlets in Singapore.

12.6 Geospatial Data 4 : Financial Service

This Financial Service data is curated by Prof. Kam, consisting of 3,320 point coordinates of financial service locations like banks and money changers in Singapore. The shapefile is already in SVY21, so there is no need for st_transform().

finserv_sf <- st_read(dsn="data/geospatial", layer="FinServ")
Reading layer `FinServ' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3320 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4881.527 ymin: 25171.88 xmax: 46526.16 ymax: 49338.02
Projected CRS: SVY21 / Singapore TM

Next, I will check for duplicated financial services.

duplicate_finserv <- finserv_sf %>%         
  group_by_all() %>%         
  filter(n()>1) %>%         
  ungroup()

Apparently there are 524 duplicate financial service records, so I will remove the duplicated records and keep the first occurrence, leaving 3,058 financial services in total.

finserv_sf<- unique(finserv_sf)

Next, I will count the number of financial services in each hexagonal grid.

honeycomb_with_busstop$'FINSERV_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, finserv_sf))

Next, I will check the distribution of FINSERV_COUNT.

summary(honeycomb_with_busstop$'FINSERV_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   1.000   3.613   4.000 130.000 
tmap_options(check.and.fix = TRUE)   
tmap_mode("view")   
tm_shape(mpsz)+
  tm_polygons(alpha=0.3)+    
  tm_shape(honeycomb_with_busstop%>%               
             filter(FINSERV_COUNT>0)) + 
  tm_borders() +      
  tm_fill("FINSERV_COUNT",            
          palette = "Blues",                      
          title = "Financial Service Count",                      
          breaks= c(1,10,20,30,50,70,100,130),
          legend.show = TRUE,                      
          alpha=0.7) +      
  tm_view(set.zoom.limits =c(11,14))+      
  tm_layout(main.title = 'Financial Service COUNT on Hexagonal Grid' ,      
            main.title.position = "center",                          
            main.title.size = 1.0,                          
            main.title.fontface = 'bold',                          
            legend.width=1)  
tmap_mode("plot") 

From the map above, similar to F&B, it is observed that the CBD area, especially Tanjong Pagar to Bugis, exhibits a higher concentration of financial services in Singapore.

12.7 Geospatial Data 5 : Leisure and Recreation

This Leisure and Recreation data is curated by Prof. Kam, consisting of 1,217 point coordinates of leisure and recreation venues like galleries, museums, sport centres, parks, fitness centres, playgrounds and studios in Singapore. The shapefile is already in SVY21, so there is no need for st_transform().

lei_rec_sf <- st_read(dsn="data/geospatial", layer="Liesure&Recreation")
Reading layer `Liesure&Recreation' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM

Next, I will check for duplicated Leisure and Recreation.

duplicate_lei_rec <- lei_rec_sf %>%            
  group_by_all() %>%            
  filter(n()>1) %>%            
  ungroup()

There are no duplicated leisure and recreation records, so next, I will count the number of leisure and recreation venues in each hexagonal grid.

honeycomb_with_busstop$'LEIREC_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, lei_rec_sf))

Next, I will check the distribution of LEIREC_COUNT.

summary(honeycomb_with_busstop$'LEIREC_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   1.307   1.000  41.000 
tmap_options(check.and.fix = TRUE)    
tmap_mode("view")    
tm_shape(mpsz)+   
  tm_polygons(alpha=0.3)+       
  tm_shape(honeycomb_with_busstop%>%                             
             filter(LEIREC_COUNT>0)) +    
  tm_borders() +         
  tm_fill("LEIREC_COUNT",                       
          palette = "Blues",                                 
          title = "Leisure and Recreation Count",                                 
          breaks= c(1,10,20,30,41),                                 
          legend.show = TRUE,                                 
          alpha=0.7) +         
  tm_view(set.zoom.limits =c(11,14))+         
  tm_layout(main.title = 'Leisure and Recreation Count on Hexagonal Grid' ,        
            main.title.position = "center",                                      
            main.title.size = 1.0,                                       
            main.title.fontface = 'bold',                                       
            legend.width=1)   
tmap_mode("plot") 

From the map above, it is observed that the Downtown area exhibits a higher concentration of leisure and recreation venues in Singapore.

12.8 Geospatial Data 6 : Retails

This Retails data is curated by Prof. Kam, consisting of 37,635 point coordinates of retail shops in Singapore. The shapefile is already in SVY21, so there is no need for st_transform().

retail_sf <- st_read(dsn="data/geospatial", layer="Retails")
Reading layer `Retails' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM

Next, I will check for duplicated Retails.

duplicate_retail <- retail_sf %>%               
  group_by_all() %>%               
  filter(n()>1) %>%               
  ungroup()

Apparently there are 347 duplicate retail records, so I will remove the duplicated records and keep the first occurrence, leaving 37,460 retail locations in total.

retail_sf<- unique(retail_sf)

Next, I will count the number of Retails in each hexagonal grid.

honeycomb_with_busstop$'RETAILS_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, retail_sf))

Next, I will check the distribution of RETAILS_COUNT.

summary(honeycomb_with_busstop$'RETAILS_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    1.00    8.00   43.83   36.00 1669.00 
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+
  tm_polygons(alpha=0.3)+
  tm_shape(honeycomb_with_busstop %>% 
             filter(RETAILS_COUNT>0)) +
  tm_borders() +
  tm_fill("RETAILS_COUNT",
          palette = "Blues",
          title = "Retails Count",
          breaks= c(1,100,500,1000,1500,1669),
          legend.show = TRUE,
          alpha=0.7) +
  tm_view(set.zoom.limits =c(11,14))+
  tm_layout(main.title = 'Retails Count on Hexagonal Grid',
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the map above, it is observed that the Bugis area exhibits a higher concentration of retail shops in Singapore.

12.9 Geospatial Data 7 : Train Station

This Train Station data is downloaded from LTA DataMall, consisting of 220 polygon features of rapid transit system stations (including depots and facility buildings) in Singapore. The layer is in SVY21 but its CRS is not registered with an EPSG code, so I will use st_transform() to set the CRS to EPSG:3414.

train_station_sf <- st_read(dsn="data/geospatial", layer="RapidTransitSystemStation") %>%
  st_transform(crs = 3414)
Reading layer `RapidTransitSystemStation' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 220 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 6068.209 ymin: 27478.44 xmax: 45377.5 ymax: 47913.58
Projected CRS: SVY21
st_crs(train_station_sf)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

12.10 Exclude Depot and Facility Center from the dataset

The RapidTransitSystemStation layer also includes depots and facility buildings, so I will keep only records whose STN_NAM_DE contains "STATION", then count the train stations in each hexagonal grid.

filtered_train_station_sf <- train_station_sf %>%
  filter(grepl("STATION", STN_NAM_DE, ignore.case = TRUE))
honeycomb_grid_sf$'TRAIN_STATION_COUNT' <- lengths(st_intersects(honeycomb_grid_sf, filtered_train_station_sf))
summary(honeycomb_grid_sf$'TRAIN_STATION_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.1177  0.0000  6.0000 
tmap_mode("plot")
tmap_options(check.and.fix = TRUE) 
tm_shape(mpsz)+
  tm_polygons()+
tm_shape(honeycomb_grid_sf %>% 
           filter(TRAIN_STATION_COUNT>0)) +      
  tm_polygons(col="TRAIN_STATION_COUNT", alpha=0.6, breaks=c(1,3,5,6))
  #tm_view(set.zoom.limits =c(11,14))
tmap_mode("plot")

12.11 Geospatial Data 8: MRT Station EXIT

This MRT Station Exit data is downloaded from LTA DataMall, consisting of 565 point coordinates of MRT Exits in Singapore. Then, I use st_transform() of the sf package to convert the coordinates to EPSG:3414, SVY21 (the projected coordinate system for Singapore).

MRT_exit_sf <- st_read(dsn="data/geospatial", layer="Train_Station_Exit_Layer") %>%
  st_transform(crs = 3414)
Reading layer `Train_Station_Exit_Layer' from data source 
  `W:\widyayutika\ISSS624\Take-home_Exercise\Take-home_Ex2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 565 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6134.086 ymin: 27499.7 xmax: 45356.36 ymax: 47865.92
Projected CRS: SVY21

Next, I will check for duplicated MRT Exits.

duplicate_MRT_exit <- MRT_exit_sf %>%                  
  group_by(stn_name,exit_code) %>%                  
  filter(n()>1) %>%                  
  ungroup()

Apparently there are 17 duplicate MRT Exit records, so I will remove the duplicated records and keep the first occurrence, leaving 556 MRT Exits in total.

MRT_exit_sf<- MRT_exit_sf %>%
  distinct(stn_name, exit_code, .keep_all=TRUE)

Next, I will count the number of MRT Exits in each hexagonal grid.

honeycomb_with_busstop$'MRT_EXIT_COUNT' <- lengths(st_intersects(honeycomb_with_busstop, MRT_exit_sf))

Next, I will check the distribution of MRT Exits.

summary(honeycomb_with_busstop$'MRT_EXIT_COUNT')
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.6619  0.0000 13.0000 
tmap_options(check.and.fix = TRUE) 
tmap_mode("view") 
tm_shape(mpsz)+
  tm_polygons(alpha=0.3)+
  tm_shape(honeycomb_with_busstop %>% 
             filter(MRT_EXIT_COUNT>0)) +
  tm_borders() +
  tm_fill("MRT_EXIT_COUNT",
          palette = "Blues",
          title = "MRT Exit Count",
          breaks= c(1,4,7,10,13),
          legend.show = TRUE,
          alpha=0.7) +
  tm_view(set.zoom.limits =c(11,14))+
  tm_layout(main.title = 'MRT Exits Count on Hexagonal Grid',
            main.title.position = "center",
            main.title.size = 1.0,
            main.title.fontface = 'bold',
            legend.width=1)
tmap_mode("plot") 

From the map above, it is observed that the Downtown area exhibits a higher concentration of MRT Exits, with a maximum of 13 in a single grid.

13 Flow_Data_Tidy

In this section, I will append all propulsive (origin) and attractiveness (destination) variables to my flow_data1 and create a new data frame, flow_data_tidy.

First, I will convert grid_id of honeycomb_with_busstop to factor format to match the grid_id of flow_data1.

honeycomb_with_busstop$grid_id <- as.factor(honeycomb_with_busstop$grid_id)

I will tidy up flow_data1.

flow_data1 <- flow_data1 %>%
  select(c("ORIGIN_GRID_ID","ORIGIN_DESC","DESTIN_GRID_ID","DESTIN_DESC",
           "MORNING_PEAK", "dist"))
flow_data_tidy <- flow_data1 %>%
  left_join(honeycomb_with_busstop,
            by = c('ORIGIN_GRID_ID' = 'grid_id')) %>%
  rename(ORIGIN_SCHOOL_COUNT=SCHOOL_COUNT,
         ORIGIN_HDB_DWELLING_UNIT=HDB_DWELLING_UNIT,
         ORIGIN_BUSINESS_COUNT=BUSINESS_COUNT,
         ORIGIN_ENTERTAINMENT_COUNT=ENTERTAINMENT_COUNT,
         ORIGIN_FNB_COUNT=FNB_COUNT,
         ORIGIN_FINSERV_COUNT=FINSERV_COUNT,
         ORIGIN_LEIREC_COUNT=LEIREC_COUNT,
         ORIGIN_RETAILS_COUNT=RETAILS_COUNT,
         ORIGIN_MRT_EXIT_COUNT=MRT_EXIT_COUNT)%>%
  select(-c(busstop_count,area_honeycomb_grid))

flow_data_tidy <- flow_data_tidy %>%
  left_join(honeycomb_with_busstop,
            by = c('DESTIN_GRID_ID' = 'grid_id')) %>%
  rename(DESTIN_SCHOOL_COUNT=SCHOOL_COUNT,
         DESTIN_HDB_DWELLING_UNIT=HDB_DWELLING_UNIT,
         DESTIN_BUSINESS_COUNT=BUSINESS_COUNT,
         DESTIN_ENTERTAINMENT_COUNT=ENTERTAINMENT_COUNT,
         DESTIN_FNB_COUNT=FNB_COUNT,
         DESTIN_FINSERV_COUNT=FINSERV_COUNT,
         DESTIN_LEIREC_COUNT=LEIREC_COUNT,
         DESTIN_RETAILS_COUNT=RETAILS_COUNT,
         DESTIN_MRT_EXIT_COUNT=MRT_EXIT_COUNT)%>%
  select(-c(busstop_count,area_honeycomb_grid))

I will write flow_data_tidy into local disk.

write_rds(flow_data_tidy,"data/rds/flow_data_tidy.rds")

Read the flow_data_tidy back into R

flow_data_tidy <- read_rds("data/rds/flow_data_tidy.rds")

14 Calibrating Spatial Interaction Models

Data Exploration on flow_data: before calibrating the models, I will examine the distribution of trips, trips vs distance, and log(trips) vs log(dist) to check for a distance-decay pattern.
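These three exploratory views can be sketched in base R as follows. This is a minimal sketch on simulated flows standing in for flow_data_tidy; the decay constant 5e6 is an arbitrary illustration value, not from the data:

```r
# Simulated stand-in for flow_data_tidy (the real columns are built above)
set.seed(42)
sim_flows <- data.frame(dist = runif(500, 200, 20000))
# Flows fall off with distance (inverse-power decay, illustrative only)
sim_flows$MORNING_PEAK <- rpois(500, lambda = 5e6 / sim_flows$dist)

# 1. Distribution of trips: typically heavily right-skewed
hist(sim_flows$MORNING_PEAK, breaks = 30, main = "Distribution of trips")

# 2. Trips vs distance: the raw relationship is dominated by a few large flows
plot(MORNING_PEAK ~ dist, data = sim_flows, main = "Trips vs distance")

# 3. log(trips) vs log(dist): roughly linear, motivating a log-log (power) model
pos <- subset(sim_flows, MORNING_PEAK > 0)
plot(log(MORNING_PEAK) ~ log(dist), data = pos, main = "log(trips) vs log(dist)")
abline(lm(log(MORNING_PEAK) ~ log(dist), data = pos), col = "blue")
```

On the real data, a near-linear log-log relationship is what justifies entering log(dist) as a predictor when calibrating the models.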

14.1 Poisson Regression
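An unconstrained Poisson spatial interaction model can be calibrated with base R's glm() using family = poisson. The sketch below runs on simulated data; the real calibration would use flow_data_tidy, with MORNING_PEAK as the response and log-transformed ORIGIN_*/DESTIN_* variables plus dist as predictors. The variable choices and coefficients here are illustrative only:

```r
set.seed(123)
# Simulated flow data standing in for flow_data_tidy (+1 avoids log(0))
sim <- data.frame(
  ORIGIN_HDB_DWELLING_UNIT = rpois(300, 2000) + 1,
  DESTIN_BUSINESS_COUNT    = rpois(300, 20) + 1,
  dist                     = runif(300, 200, 20000))
# Generate flows with known propulsive, attractive and distance-decay effects
sim$MORNING_PEAK <- rpois(300, lambda = exp(
  8 + 0.5 * log(sim$ORIGIN_HDB_DWELLING_UNIT) +
      0.6 * log(sim$DESTIN_BUSINESS_COUNT) -
      0.8 * log(sim$dist)))

# Unconstrained Poisson SIM: log(flows) as a linear function of
# log-propulsiveness, log-attractiveness and log-distance
uncSIM <- glm(MORNING_PEAK ~ log(ORIGIN_HDB_DWELLING_UNIT) +
                log(DESTIN_BUSINESS_COUNT) + log(dist),
              family = poisson(link = "log"), data = sim)
summary(uncSIM)
```

A negative, significant coefficient on log(dist) confirms distance decay; origin- and destination-constrained variants would replace the corresponding log-attribute term with origin or destination fixed effects.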
